Few-Shot Natural Language Processing by Meta-Learning Without Labeled Data
Humans show a remarkable capability to accurately solve a wide range of problems efficiently -- utilizing a limited amount of computation and experience. Deep learning models, by stark contrast, can be trained to be highly accurate on a narrow task while being highly inefficient in terms of the amount of compute and data required to reach that accuracy. Within natural language processing (NLP), recent breakthroughs in unsupervised pretraining have enabled reusable models that can be applied to many NLP tasks; however, learning new tasks is still inefficient. This has led to research on few-shot learning, where the goal is to generalize to new tasks with very few labeled instances. Meta-learning, or learning to learn, treats the learning process itself as a learning problem, with the goal of producing systems that can generalize to new tasks efficiently. This has the potential to produce few-shot learners that can accurately solve a wide range of new tasks. However, meta-learning requires a distribution over tasks with relevant labeled data, which can be difficult to obtain, severely limiting the practical utility of meta-learning methods. In this dissertation, we develop methods that enable large-scale meta-learning from unlabeled text and improve the few-shot generalization ability of NLP models.
We contribute methods that synthesize tasks from unlabeled text, providing a large task distribution for meta-learning. Meta-learning from millions of these self-supervised tasks enables rapid learning of new tasks and minimizes the train-test mismatch in few-shot learning by optimizing pre-training directly for future fine-tuning with a few examples. Since real-world applications of NLP require learning diverse tasks with different numbers of classes, we first introduce an optimization-based meta-learning method that can learn from multiple NLP classification tasks with any number of classes. We then leverage the proposed self-supervised approach to create meta-training tasks with a diverse number of classes and meta-train models for few-shot learning on this task distribution. This improves representation learning, learns key hyper-parameters such as learning rates, can be combined with supervised tasks to regularize supervised meta-learning, and leads to accurate few-shot learning on a diverse set of NLP classification tasks. We further explore the space of self-supervised tasks for meta-learning by considering important aspects such as task diversity, difficulty, type, domain, and curriculum, and investigate how they affect meta-learning performance. Our analysis shows that all these factors meaningfully alter the task distribution, some inducing significant improvements in the downstream few-shot accuracy of the meta-learned models.
Our findings yield accurate and efficient meta-learning methods that improve few-shot generalization to diverse tasks and should enable many future applications of meta-learning in NLP, such as hyper-parameter optimization, continual learning, efficient learning, learning in low-resource languages, and more.
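The self-supervised task creation the abstract describes can be illustrated with a toy sketch: build an N-way classification task from unlabeled sentences by masking a chosen word in each example, with the label being which word was masked. The function name, whitespace tokenization, and sampling scheme below are illustrative assumptions, not the dissertation's exact procedure.

```python
import random

def make_self_supervised_task(sentences, num_classes=3, shots=2, seed=0):
    """Toy sketch: create an N-way, K-shot classification task from
    unlabeled text. Each class corresponds to one masked-out word;
    the model must infer which word was masked."""
    rng = random.Random(seed)
    # Index sentences by the words they contain (whitespace tokens).
    by_word = {}
    for s in sentences:
        for w in set(s.split()):
            by_word.setdefault(w, []).append(s)
    # Keep words frequent enough to supply `shots` examples per class.
    candidates = [w for w, sents in by_word.items() if len(sents) >= shots]
    words = rng.sample(candidates, num_classes)
    task = []
    for label, w in enumerate(words):
        for s in rng.sample(by_word[w], shots):
            masked = " ".join("[MASK]" if t == w else t for t in s.split())
            task.append((masked, label))
    return words, task
```

Sampling millions of such tasks from a large corpus yields the kind of task distribution over which meta-learning can be run without any human labels.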
A Moment in the Sun: Solar Nowcasting from Multispectral Satellite Data using Self-Supervised Learning
Solar energy is now the cheapest form of electricity in history. Unfortunately, significantly increasing the electric grid's fraction of solar energy remains challenging due to its variability, which makes balancing electricity's supply and demand more difficult. While thermal generators' ramp rate (the maximum rate at which they can change their energy generation) is finite, solar energy's ramp rate is essentially infinite. Thus, accurate near-term solar forecasting, or nowcasting, is important to provide advance warnings to adjust thermal generator output in response to variations in solar generation to ensure a balanced supply and demand. To address the problem, this paper develops a general model for solar nowcasting from abundant and readily available multispectral satellite data using self-supervised learning.
Specifically, we develop deep auto-regressive models using convolutional neural networks (CNN) and long short-term memory networks (LSTM) that are globally trained across multiple locations to predict raw future observations of the spatio-temporal spectral data collected by the recently launched GOES-R series of satellites. Our model estimates a location's near-term future solar irradiance based on satellite observations, which we feed to a regression model trained on smaller site-specific solar data to provide near-term solar photovoltaic (PV) forecasts that account for site-specific characteristics. We evaluate our approach for different coverage areas and forecast horizons across 25 solar sites and show that it yields errors close to that of a model using ground-truth observations.
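The auto-regressive structure of the pipeline (predict future satellite frames from past frames, then map the prediction to a site-level quantity) can be conveyed with a deliberately simplified stand-in. The linear least-squares model below is an illustrative toy, not the paper's CNN-LSTM; the function names and the flattened-frame representation are assumptions for this sketch.

```python
import numpy as np

def fit_ar_model(frames, k=3):
    """Fit a linear auto-regressive model predicting frame t from the
    k preceding frames. Toy stand-in for a CNN-LSTM next-frame model.
    frames: array of shape (T, H, W)."""
    T = frames.shape[0]
    # Each training input is the k previous frames, flattened.
    X = np.stack([frames[t - k:t].reshape(-1) for t in range(k, T)])
    y = frames[k:].reshape(T - k, -1)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(frames, coef, k=3):
    """Predict the next frame from the last k observed frames."""
    x = frames[-k:].reshape(1, -1)
    return (x @ coef).reshape(frames.shape[1:])
```

In the paper's setup, the predicted spectral observations would then be fed to a separate regression model trained on a small amount of site-specific data to produce the PV forecast.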
Simultaneously Linking Entities and Extracting Relations from Biomedical Text Without Mention-level Supervision
Understanding the meaning of text often involves reasoning about entities and
their relationships. This requires identifying textual mentions of entities,
linking them to a canonical concept, and discerning their relationships. These
tasks are nearly always viewed as separate components within a pipeline, each
requiring a distinct model and training data. While relation extraction can
often be trained with readily available weak or distant supervision, entity
linkers typically require expensive mention-level supervision -- which is not
available in many domains. Instead, we propose a model which is trained to
simultaneously produce entity linking and relation decisions while requiring no
mention-level annotations. This approach avoids cascading errors that arise
from pipelined methods and more accurately predicts entity relationships from
text. We show that our model outperforms a state-of-the-art entity linking and relation extraction pipeline on two biomedical datasets and can drastically improve the overall recall of the system.
Comment: Accepted in AAAI 202
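The advantage of joint decisions over a pipeline can be sketched with a toy joint decoder: rather than committing to the highest-scoring entity link first and then classifying the relation, score complete (entity, relation, entity) assignments together, so strong relation evidence can rescue a lower-ranked link. The score dictionaries and names below are hypothetical placeholders, not the paper's model.

```python
import itertools

def joint_decode(cands1, cands2, link_score, rel_score, relations):
    """Pick the (entity, relation, entity) triple with the best combined
    linking + relation score, letting each decision inform the other."""
    best_triple, best_score = None, float("-inf")
    for e1, e2 in itertools.product(cands1, cands2):
        for r in relations:
            s = (link_score[e1] + link_score[e2]
                 + rel_score.get((e1, r, e2), float("-inf")))
            if s > best_score:
                best_triple, best_score = (e1, r, e2), s
    return best_triple, best_score
```

A greedy pipeline would link each mention to its top candidate and only then classify the relation; the joint search above avoids that cascading error by construction, at the cost of enumerating candidate combinations.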
Relating Romanized Comments to News Articles by Inferring Multi-Glyphic Topical Correspondence
Commenting is a popular facility provided by news sites, and analyzing such user-generated content has recently attracted research interest. However, in multilingual societies such as India, this analysis is hard for several reasons: (1) There are more than 20 official languages, but linguistic resources are available mainly for Hindi. It is observed that people frequently use romanized text, as it is easy and quick to type on an English keyboard, resulting in multi-glyphic comments, where the texts are in the same language but in different scripts. Such romanized texts are almost unexplored in machine learning so far. (2) In many cases, comments are made on a specific part of the article rather than the topic of the entire article. Off-the-shelf methods such as correspondence LDA are insufficient to model such relationships between articles and comments. In this paper, we extend the notion of correspondence to model multi-lingual, multi-script, and inter-lingual topics in a unified probabilistic model called the Multi-glyphic Correspondence Topic Model (MCTM). Using several metrics, we verify our approach and show that it improves over the state-of-the-art.
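The core correspondence idea (each comment word's topic is drawn from the paired article's topic distribution, tying comments to what the article is about) can be shown as a toy generative step. The vocabularies and distributions below are invented for illustration and are far simpler than MCTM's multi-lingual, multi-script machinery.

```python
import random

def generate_comment(article_topic_dist, topic_words, n_words, seed=0):
    """Toy correspondence step: every comment word first picks a topic
    from the ARTICLE's topic distribution, then a word from that topic."""
    rng = random.Random(seed)
    topics = list(range(len(article_topic_dist)))
    out = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=article_topic_dist)[0]
        vocab, probs = topic_words[z]
        out.append(rng.choices(vocab, weights=probs)[0])
    return out
```

Because comment topics are constrained to the article's topics, inference under such a model can attribute a comment to the specific part of the article it responds to, which is the relationship plain LDA cannot capture.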